Automatic differentiation
Automatic differentiation is provided by the callable operator ∂.
For a function f and arguments args..., ∂{N}(f, args...) computes the Nth-order partial derivative of f with respect to args.... ∂(f, args...) is equivalent to ∂{1}(f, args...).
The basic usage is:
∂{N}(f, args...)returns only the highest-order derivative.∂{N}(f, args..., :all)returns all derivatives up to orderN, together with the function value.
When pseudo keyword :all is given, the return value is ordered from higher to lower order, ending with the function value:
(∂{N}(f, args...), ..., ∂{2}(f, args...), ∂(f, args...), f(args...))For multiple inputs, mixed partial derivatives are grouped by input blocks. If f returns a tuple, each component is differentiated separately.
The user must provide the appropriate tensor symmetry information; otherwise, automatic differentiation may not preserve the expected tensor symmetry. In the following example, even with identical tensor values, the results differ depending on the Tensor type.
julia> A = rand(Mat{3,3})3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}: 0.325977 0.894245 0.953125 0.549051 0.353112 0.795547 0.218587 0.394255 0.49425julia> S = A * A' # symmetric in value, but not typed as `SymmetricSecondOrderTensor`3×3 Tensor{Tuple{3, 3}, Float64, 2, 9}: 1.81438 1.253 0.894897 1.253 1.05904 0.65243 0.894897 0.65243 0.4475julia> ∂(identity, S) ≈ one(FourthOrderTensor{3})truejulia> ∂(identity, symmetric(S)) ≈ one(SymmetricFourthOrderTensor{3})true
Tensorial.∂ — Type
∂{N}An operator representing the Nth-order partial derivative.
∂{N} is callable. Applying it to a function f and its arguments computes Nth-order partial derivatives of f by automatic differentiation. ∂(f, args...) is equivalent to ∂{1}(f, args...).
If pseudo keyword :all is given as the last argument, derivatives of all orders up to N are returned together with the function value. For example, ∂{N}(f, x, :all) returns (∂{N}(f,x), ..., ∂{2}(f,x), ∂{1}(f,x), f(x)).
Basic examples
We begin with the simplest case: a scalar-valued function of a single scalar variable.
julia> x = 2.02.0julia> ∂(x -> x^3, x)12.0julia> ∂(x -> x^3, x, :all)(12.0, 8.0)
Here, ∂(x -> x^3, x) returns only the first derivative, while ∂(x -> x^3, x, :all) returns
(∂f/∂x, f(x))Higher-order derivatives are obtained by specifying the order in braces.
julia> ∂{2}(x -> x^3, x)12.0julia> ∂{2}(x -> x^3, x, :all)(12.0, 12.0, 8.0)
In this case,
∂{2}(x -> x^3, x, :all)returns
(∂²f/∂x², ∂f/∂x, f(x))The same interface also works for tensor inputs.
julia> a = rand(Vec{2})2-element Vec{2, Float64}: 0.7484150218874741 0.5782319465613976julia> ∂(norm, a)2-element Vec{2, Float64}: 0.7913304024086155 0.6113887423103394julia> ∂(norm, a, :all)([0.7913304024086155, 0.6113887423103394], 0.9457680630107)julia> ∂{2}(norm, a)2×2 SymmetricSecondOrderTensor{2, Float64, 3}: 0.39523 -0.511553 -0.511553 0.662111julia> ∂{2}(norm, a, :all)([0.3952302988894536 -0.5115530100904504; -0.5115530100904504 0.6621113888988412], [0.7913304024086156, 0.6113887423103395], 0.9457680630106999)
Multiple inputs
For multiple inputs, the first-order derivative is returned as a tuple whose entries follow the order of the inputs.
julia> ∂((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0)(16.0, 14.0)julia> ∂((x, y) -> x^2 + 3x*y + y^2, 2.0, 4.0, :all)((16.0, 14.0), 44.0)
The first result is interpreted as
(∂f/∂x, ∂f/∂y)and the second as
((∂f/∂x, ∂f/∂y), f(x, y))Second-order derivatives for multiple inputs are returned as a block Hessian.
julia> ∂{2}((x, y) -> x^2 + x*y + y^3, 2.0, 3.0)((2.0, 1.0), (1.0, 18.0))julia> ∂{2}((x, y) -> x^2 + x*y + y^3, 2.0, 3.0, :all)(((2.0, 1.0), (1.0, 18.0)), (7.0, 29.0), 37.0)
The first result is interpreted as
(
(∂²f/∂x², ∂²f/∂x∂y),
(∂²f/∂y∂x, ∂²f/∂y²),
)and the second as
(
(
(∂²f/∂x², ∂²f/∂x∂y),
(∂²f/∂y∂x, ∂²f/∂y²),
),
(∂f/∂x, ∂f/∂y),
f(x, y),
)The same block structure is used even when the input types differ.
julia> x = 2.02.0julia> A = rand(SymmetricSecondOrderTensor{2})2×2 SymmetricSecondOrderTensor{2, Float64, 3}: 0.727935 0.00744801 0.00744801 0.199377julia> ∂((x, A) -> x * tr(A), x, A)(0.9273116153257611, [2.0 0.0; 0.0 2.0])julia> ∂((x, A) -> x * tr(A), x, A, :all)((0.9273116153257611, [2.0 0.0; 0.0 2.0]), 1.8546232306515222)julia> ∂{2}((x, A) -> x * tr(A), x, A)((0.0, [1.0 0.0; 0.0 1.0]), ([1.0 0.0; 0.0 1.0], [0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0;;;; 0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0]))julia> ∂{2}((x, A) -> x * tr(A), x, A, :all)(((0.0, [1.0 0.0; 0.0 1.0]), ([1.0 0.0; 0.0 1.0], [0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0;;;; 0.0 0.0; 0.0 0.0;;; 0.0 0.0; 0.0 0.0])), (0.9273116153257611, [2.0 0.0; 0.0 2.0]), 1.8546232306515222)
Multiple outputs
If f returns a tuple, each component is differentiated separately. The outer tuple follows the outputs.
julia> ∂(x -> (x^2, x^3), 2.0)(4.0, 12.0)julia> ∂(x -> (x^2, x^3), 2.0, :all)((4.0, 12.0), (4.0, 8.0))
The first result is interpreted as
(∂f₁/∂x, ∂f₂/∂x)and the second as
((∂f₁/∂x, ∂f₂/∂x), (f₁(x), f₂(x)))Second-order derivatives are handled in the same way.
julia> ∂{2}(x -> (x^2, x^3), 2.0)(2.0, 12.0)julia> ∂{2}(x -> (x^2, x^3), 2.0, :all)((2.0, 12.0), (4.0, 12.0), (4.0, 8.0))
The first result is interpreted as
(∂²f₁/∂x², ∂²f₂/∂x²)and the second as
(
(∂²f₁/∂x², ∂²f₂/∂x²),
(∂f₁/∂x, ∂f₂/∂x),
(f₁(x), f₂(x)),
)Multiple inputs and multiple outputs
When there are both multiple inputs and multiple outputs, the outer tuple follows the outputs, and the inner tuple follows the inputs.
julia> ∂((x, y) -> (x + y, x * y), 2.0, 3.0)((1.0, 1.0), (3.0, 2.0))julia> ∂((x, y) -> (x + y, x * y), 2.0, 3.0, :all)(((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))
The first result is interpreted as
(
(∂f₁/∂x, ∂f₁/∂y),
(∂f₂/∂x, ∂f₂/∂y),
)and the second as
(
(
(∂f₁/∂x, ∂f₁/∂y),
(∂f₂/∂x, ∂f₂/∂y),
),
(f₁(x, y), f₂(x, y)),
)For second-order derivatives, each output carries its own block Hessian.
julia> ∂{2}((x, y) -> (x + y, x * y), 2.0, 3.0)(((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0)))julia> ∂{2}((x, y) -> (x + y, x * y), 2.0, 3.0, :all)((((0.0, 0.0), (0.0, 0.0)), ((0.0, 1.0), (1.0, 0.0))), ((1.0, 1.0), (3.0, 2.0)), (5.0, 6.0))
The first result is interpreted as
(
(
(∂²f₁/∂x², ∂²f₁/∂x∂y),
(∂²f₁/∂y∂x, ∂²f₁/∂y²),
),
(
(∂²f₂/∂x², ∂²f₂/∂x∂y),
(∂²f₂/∂y∂x, ∂²f₂/∂y²),
),
)and the second as
(
(
(
(∂²f₁/∂x², ∂²f₁/∂x∂y),
(∂²f₁/∂y∂x, ∂²f₁/∂y²),
),
(
(∂²f₂/∂x², ∂²f₂/∂x∂y),
(∂²f₂/∂y∂x, ∂²f₂/∂y²),
),
),
(
(∂f₁/∂x, ∂f₁/∂y),
(∂f₂/∂x, ∂f₂/∂y),
),
(f₁(x, y), f₂(x, y)),
)Aliases
gradient and hessian are aliases for first- and second-order partial derivatives:
Tensorial.gradient — Type
const gradient = ∂{1}Tensorial.hessian — Type
const hessian = ∂{2}